Introduction

Definitions

Before proceeding to study the actual subject of data structures, it is necessary to make ourselves aware of the various terms used in the subject. In the present section, the basic terminologies and concepts are, therefore, described with examples to get the readers started.

Data

The term data means a value or a set of values. In this book the term data is used in the context of both singular and plural forms. The following are some examples of data:

(i) 34

(ii) 12/01/1965

(iii) ISBN 81-203-0000-0

(iv) | | |

(v) Pascal

(vi) Æ

(vii) 21, 25, 28, 30, 35, 37, 38, 41, 43

In all the above examples, some values are represented in different ways. Each value or a collection of all such values is termed data.

Entity

An entity is one that has certain attributes and which may be assigned values. For example, an employee in an organisation is an entity. The possible attributes and their corresponding values for an entity in the present example are:

Entity : EMPLOYEE

Attributes : NAME DOB SEX DESIGNATION

Values : RAVI ANAND 30/12/61 M DIRECTOR

All the employees in an organization constitute an entity set. Each attribute of an entity set has a range of values, and this range is called the domain of the attribute. A domain is, thus, the set of all possible values that could be assigned to a particular attribute. For example, in the EMPLOYEE entity, the attribute SEX has the domain {M, F}. It may be noted that an entity after values having been assigned to it, results in a composite data.

Information

The term information is used for data with its attribute(s). In other words, information can be defined as meaningful data or processed data. For example, the set of data as presented earlier becomes information related to a person if we impose the meanings as mentioned below.

Data Meaning

34 Age of the person

12/01/1965 Date of birth of the person

ISBN 81-203-0000-0 Book number, recently published by the person

| | | Number of awards equal to the tally marks achieved by

the person

Pascal Nickname of the person

Æ Signature of the person

21, 25, 28, 30, 35, 37, 38, 41, 43 Important ages of the person

Difference between data and information

From the definition of data and information, it is evident that these two are not the same, however, there is a relation between them. Figure 1.1 shows the interrelation between data and information.

As an example, suppose there is a set of data consisting of the amount of milk consumed by a person in a month. From this given set of data, information can be retrieved as follows:

(i) What was the total amount of milk consumed in the month?

(ii) On which day was the maximum amount of milk consumed?

(iii) On which day was the minimum amount of milk consumed?

(iv) What was the average amount of milk consumption per day?

(v) What was the amount of carbohydrates assimilated?

(vi) What was the amount of proteins assimilated?

(vii) What was the amount of fats assimilated?

(viii) What was the amount of minerals assimilated?

(ix) What amount of average calories gained per day?

(x) And so on.

To get all this information from the given set of data, we need to define certain processes and then apply the corresponding process on the data. (In fact, the terms data, entity and information have no standard definition in computer science and they are often used interchangeably.)

Figure 1.1 Relation between data and information.

Data type

A data type is a term which refers to the kind of data that may appear in a computation. The following are a few well-known data types:

Data Data type

34 Numeric (integer)

12/01/1965 Date

ISBN 81-203-0000-0 Alphanumeric

| | | Graphics

Pascal String

Æ Image

21, 25, 28, 30, 35, 37, 38, 41, 43 Array of integers

Real, boolean, character, complex, etc. are also some more frequently used data types.

Built-in data type

In every programming language, there is a set of data types called built-in data types. For example, in C, FORTRAN, and Pascal, the data types that are available as built-in are listed below:

C : int, float, char, double, enum, etc.

FORTRAN : INTEGER, REAL, LOGICAL, COMPLEX, DOUBLE PRECISION, CHARACTER, etc.

Pascal : Integer, Real, Character, Boolean, etc.

With the built-in data types, programming languages provide users with a lot of advantages of processing of various types of data. For example, if a user declares a variable of type Real (say), then several things are automatically implied, such as how to store a value for that variable, what are the different operations possible on that type of data, what amount of memory is required to store, etc. All these things are taken care of by the compiler or the run-time system manager.

Abstract data type

When an application requires a special kind of data which is not available as a built-in data type, then it is the programmer’s responsibility to implement his own kind of data. Here, the programmer has to specify how to store a value for that data, what are the operations that can meaningfully manipulate variables of that kind of data, amount of memory required to store a variable. The programmer has to decide all these things and accordingly implement them. Programmers’ own data type is termed abstract data type. The abstract data type is also alternatively termed user-defined data type. For example, suppose we want to process dates of the form dd/mm/yy. For this, no built-in data type is known in C, FORTRAN, and Pascal. If a programmer wants to process dates, then an abstract data type, say Date, has to be devised and various operations such as adding a few days to a date to obtain another date, finding the days between two dates, obtaining a day for a given date, etc. have to be defined accordingly. Besides these, programmers should decide how to store the data, what amount of memory will be needed to represent a value of Date, etc. An abstract data type, in fact, can be built with the help of built-in data types and other abstract data type(s) already built by the programmer. Some programming languages provide a facility to build abstract data types easily. For example, using struct/class in C/C++, and using record in Pascal, programmers can define their own data types.

Concept of Data Structures

A digital computer can manipulate only primitive data, that is, data in terms of 0’s and 1’s. Manipulation of primitive data is inherent within the computer and does not require any extra effort on the part of the user. But in our real-life applications, various kinds of data other than the primitive data are involved. Manipulation of real-life data (also termed user data) requires the following essential tasks:

1. Storage representation of user data: User data should be stored in such a way that the computer can understand it.

2. Retrieval of stored data: Data stored in a computer should be retrieved in such a way that the user can understand it.

3. Transformation of user data: Various operations which require to be performed on user data so that it can be transformed from one form to another.

The basic theory of computer science deals with the manipulation of various kinds of data, wherefrom the concept of data structures comes in. In fact, data structures constitute the fundamentals of computer science. For a given kind of user data, its structure implies the following:

1. Domain (D): This is the range of values that the data may have. This domain is also termed data object.

2. Function (F): This is the set of operations which may legally be applied to elements of the data object. This implies that for a data structure, we must specify the set of operations.

3. Axioms (A): This is the set of rules with which the different operations belonging to F can actually be implemented.

Now we can define the term data structure.

A data structure D is a triplet, that is, D = (D, F, A) where D is a set of data object, F is a set of functions and A is a set of rules to implement the functions. Let us consider an example.

We know that for the integer data type (int) in the C programming language the structure includes the following types:

D = (0, +1, +2, +3, ...)

F = (+, –, *, /, %)

A = (A set of binary arithmetics to perform addition, subtraction, division, multiplication, and modulo operations.)

It can be easily recognized that the triplet (D, F, A) is nothing but an abstract data type. Also, the elements in set D are not necessarily from primitive data; it may contain elements from some other abstract data types. Alternatively, an implementation of a data structure D is a mapping from D to a set of other data structures
D i, i = 1, 2, ..., n, for some n. More precisely, this mapping specifies how every object of D is to be represented by the objects of D i, i = 1, 2, ..., n. Every function of D must be written using the function of the implementing data structures D i, i = 1, 2, ..., n. The fact is that each of the implementing data structures is either a primitive data type or an abstract data type. We can conclude the discussion with another example.

Suppose, we want to implement a data type, namely Complex as an abstract data type. Any variable of the complex data type has two parts: a real part and an imaginary part. In our usual notation, if z is a complex number, then z = x + iy, where x and y are the real and imaginary parts, respectively. Both x and y are of the Real data type which is another abstract data type (available as a built-in data type). So, the abstract data type Complex can be defined using the data structure Real as

Complex z {

x : Real

y : Real

}

Now the set D of Complex can be realized from the domain of x and y which is Real in this case. Let us specify the set of operations for the Complex data type, which are stated as F:

F = (Å, –, Ä, ¸, Ñ, || )

Assume that z1 = x1 + iy1 and z2 = x2 + iy2 are two data of the Complex data type. Then we can define the rules for implementing the operations in F, thus giving axioms A. In the current example, for the Complex data type

A = {

z = z1 Å z2 = (x1 + x2) + i(y1 + y2) (complex addition)

z = z1 – z2 = (x1 – x2) + i(y1 – y2) (complex subtraction)

z = z1 Ä z2 = (x1 ´ x2 – y1 ´ y2) + i(x1 ´ y2 + x2 ´ y1) (complex multiplication)

z = z1 ¸ z2

(complex division)

z = Ñz1 (complex conjugate)

z = | z1 | = (complex magnitude)

}

Note that how different operations of the Complex data type can be implemented using the operations +, –, ´, /, of the implementing data structure, namely Real.

Overview of Data Structures

In computer science, several data structures are known depending on the areas of application. Out of them, a few data structures are frequently used in almost all application areas and with the help of which almost all complex data structures can be constructed. These data structures are known as fundamental data structures or classic data structures. Figure 1.2 gives a classification of all classic data structures.

Figure 1.2 Classification of classic data structures.

In addition to these classic data structures, other data structures such as lattice, Petri nets, neural nets, search graphs, semantic nets, etc. are known in various applications. These are known to be very complex data structures. (The discussion of all these complicated data structures is beyond the scope of this book.)

It can be observed from Figure 1.2 that all the classic data structures are classified into two main classes: linear data structures and nonlinear data structures. In case of linear data structures, all the elements form a sequence or maintain a linear ordering. On the other hand, no such sequence in elements exists, rather all the elements are distributed over a plane, in case of nonlinear data structures. Again within each classical data structure there are a number of variations as depicted in Figure 1.2. In this book, we have planned eight chapters to discuss eight classic data structures, one type in each chapter. As an overview, all the classic data structures discussed in this book can be seen in Figure 1.3.

Figure 1.3 Overview of classic data structures.

Implementation of Data Structures

We will implement all the classic data structures as mentioned in the previous section in two phases:

Phase 1

Storage representation: Here we will decide how a data structure can be stored in a computer memory. This storage representation, in general, is based on the use of other data structures.

Phase 2

Algorithmic notation of various operations of the classic data structure: A function for manipulating a data structure is expressed in terms of algorithms so that the details of the operation can be understood easily and the reader can implement them with the help of any programming language. It will be preferable to use C/C++ programming because of their versatile features and modern programming approaches.

In order to express the algorithm for a given operation, we will assume different control structures and notations, as shown in Figure 1.5.

Different statements that will be used in the algorithms are very much similar to the statements in the C language. For example, there are some basic constructs in the C language such as, if (condition) then (statement1) else (statement2), while (condition) do, for (initialization, condition, update) do which are considered. All statements in the algorithm are uniquely identified by their step numbers, so that each of them can be easily referred to in our subsequent discussions. The final step in an algorithm, which is a Stop statement, indicates the end of the algorithm. Sufficient comments will be introduced side by side against statements to indicate what step a statement will perform. The convention of putting comments is similar to the convention of placing comments in the C/C++ programming language. All comments are made small and precise so that they exactly signify the step(s).

Another convention that we will assume is the naming of variables. The variable names are either in capital letters or in small letters. The variable names in capital letters will be treated either as constant or as input to the algorithm. All the variable names in small letters are local to the algorithm and will be treated as temporary variables.

The call of an operation (that is a function) is represented with boldface letters, and usually already defined in the book or else defined subsequently.

Algorithm <Name of the operation>

Input: <Specification of input data for the operation>

Output: <Specification of output after the successful performance of the operation>

Remark: <If the operation assumes other data structure for its implementation or something important>

Steps:

1 . . . . . . . . . . . . . . . . .

2 If <condition> then // Comment on this step, if any is applicable

3 . . . . . . . . . . . . . . . .

4 . . . . . . . . . . . . . . . .

5 Else

6 . . . . . . . . . . . . . . . . . . .

7 . . . . . . . . . . . . . . . . . .

8 EndIf

9 . . . . . . . . . . . . . . . . . . .

10 . . . . . . . . . . . . . . . . . . .

11 While <condition> do

12 . . . . . . . . . . . . . . . . . .

13 . . . . . . . . . . . . . . . . . .

14 EndWhile

15 . . . . . . . . . . . . . . . . . .

16 . . . . . . . . . . . . . . . . . .

/* Comment on the following few steps, if any is applicable */

17 For <loop condition> do

18 . . . . . . . . . . . . . . . . . .

19 . . . . . . . . . . . . . . . . . .

20 EndFor

21 . . . . . . . . . . . . . . . . . .

22 . . . . . . . . . . . . . . . . . .

23 Stop

Figure 1.5 Format of presenting an algorithm for an operation.